DETAILED ACTION
Claim Rejections - 35 USC § 101 

1. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or 
composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, 
subject to the conditions and requirements of this title. 

Claims 25-31 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non- 
statutory subject matter. Claims 25-31 are drawn to functional descriptive material NOT claimed as 
residing on a computer readable medium. MPEP 2106.IV.B.1(a) (Functional Descriptive Material) states: 

"Data structures not claimed as embodied in computer-readable media are descriptive material 
per se and are not statutory because they are not capable of causing functional change in the computer." 

"Such claimed data structures do not define any structural and functional interrelationships 
between the data structure and other claimed aspects of the invention which permit the data structure's 
functionality to be realized." 

Claims 25-31, while defining a "machine-readable medium", do not define a "computer-readable
medium" and are thus non-statutory for that reason. A "machine-readable medium" can range
from paper on which the program is written, to a program simply contemplated and memorized by a
person. Amending the claims to embody the program on a "computer-readable medium" in order to make
the claims statutory is suggested.

"In contrast, a claimed computer-readable medium encoded with a data structure defines 
structural and functional interrelationships between the data structure and the computer software and 
hardware components which permit the data structure's functionality to be realized, and is thus statutory." 
- MPEP 2106.IV.B.1(a)

Claim Rejections - 35 USC § 102

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for
the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or 
in public use or on sale in this country, more than one year prior to the date of application for 
patent in the United States. 

3. Claims 1, 2, 4-10, 12-17, and 19-24 are rejected under 35 U.S.C. 102(b) as being anticipated by
Bokser (U.S. Patent Number 4,773,099).

1 . A method for automatic triage of a text passage outputted by an optical character 
recognition system, the OCR-output text passage having at least one OCR-output 
character, the method comprising: 

determining at least one OCR-output character attribute for each OCR-output character; 

Bokser discloses a method using an optical character recognition database to classify the
unknown characters received (col. 4, l. 20-22). Each character is then analyzed by the feature
extraction module (Fig. 1), and the resulting features or attributes are quantified into vector format.

determining an error rate for the OCR-output text passage using a triage model and the 
determined at least one OCR-output character attribute; 

Bokser's method, which can be used to classify single characters or whole documents,
calculates errors based on one or a plurality of test reference vectors and then sorts them
according to quality, which is the definition of triage (Fig. 10B).

and comparing the determined error rate for the OCR-output text passage with an OCR-
output text passage threshold error rate to perform an OCR-output text passage triage 
decision. 

Bokser's method then computes the standard deviation of the error values and sets a
threshold or "confidence bound" (Fig. 10B). The method thereafter compares the calculated error
of every character attribute with this threshold to determine whether the character is in fact
identified, and with what degree of confidence, and groups the characters accordingly (Fig. 3A-3E).
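For illustration only, the comparison recited in this limitation can be sketched as follows. The function name `triage_passage`, the labels "accept" and "review", and the choice of the mean per-character error probability as the passage error rate are assumptions made for this sketch; they are taken neither from the claims nor from Bokser.

```python
def triage_passage(char_error_probs, threshold_error_rate):
    """Compare a passage's estimated error rate with a threshold error rate.

    char_error_probs: hypothetical per-character error probabilities in [0, 1].
    threshold_error_rate: hypothetical passage-level threshold in [0, 1].
    """
    if not char_error_probs:
        raise ValueError("a text passage has at least one OCR-output character")
    # Assumption: the passage error rate is the mean character error probability.
    error_rate = sum(char_error_probs) / len(char_error_probs)
    # Triage decision: passages at or below the threshold pass without review.
    decision = "accept" if error_rate <= threshold_error_rate else "review"
    return decision, error_rate
```

Under these assumptions, a passage whose characters carry error probabilities of 0.01 and 0.03 falls below a threshold of 0.05 and is accepted, while one with 0.2 and 0.4 is routed to review.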

2. The method of claim 1, wherein determining an error rate for the OCR-output text 
passage comprises: 

providing the at least one OCR-output character attribute to the triage model; 

For each unknown character to be classified, the input to the classification module is a 
feature vector and geometry. The feature vector contains information defining selected features 
of the unknown character. Of importance, the technique of this invention operates properly 
regardless of how the feature vector is derived and what types of features are used to construct 
the feature vector (col. 24, l. 7-14).

determining a character interpretation error value for each OCR-output character based on 
a probability of the at least one OCR-output character attribute being erroneously 
interpreted by the system;

The classification module produces, for each unknown input character, a possibility set,
which is a list of characters and associated confidences, which the unknown character might be
(col. 23, l. 11-14).

and determining a text passage error value based on the at least one character 
interpretation error value determined for each OCR-output character. 

Generally, the possibility set contains a single character candidate, but it is possible for
this possibility set to contain no character candidates or more than one character candidate
(col. 23, l. 16-19).

4. The method of claim 1, wherein determining at least one OCR-output character attribute 
for each OCR-output character comprises selecting the at least one OCR-output character 
attribute from a plurality of OCR-output character attributes. 

For each unknown character to be classified, the input to the classification module is a feature 
vector and geometry. The feature vector contains information defining selected features of the 
unknown character. Of importance, the technique of this invention operates properly regardless 
of how the feature vector is derived and what types of features are used to construct the feature 
vector (col. 24, l. 7-14).

5. The method of claim 4, wherein the plurality of OCR-output character attributes 
includes at least one of a character class, a confidence descriptor class, a language of the 
text passage, a text passage publication date, a typeface in which the text passage is 
printed, an image-based feature of an individual character image and metadata attached to 
the text passage. 



Application/Control Number: 10/064,435 Page 6 

Art Unit: 2621 

The geometry of the unknown character supplies the width and the height of the 
character, as defined by those "on" pixels forming the character, and subline information 
pertaining to the character (col. 24, l. 21-24).

6. The method of claim 1, wherein the text passage to be triaged includes at least one of 
pages, characters, words, phrases, text-lines, sentences, paragraphs, columns of text, 
blocks of text, text articles, multi-page documents, collections of single-page documents 
and collections of multi-page documents. 

The classification module produces a possibility set for each unknown input character (col. 23, l. 11-12).

7. The method of claim 1, wherein the OCR-output text passage triage decision includes at 
least one of sending the OCR-output text passage directly to an end user without post- 
OCR processing, sending the OCR-output text passage through a post-OCR inspection 
and processing stage, and sending the original text passage image to be keyed in 
manually. 

As can be seen in Fig. 12, the method goes straight from computing the possibility set to outputting
the possibility set. If the possibility set created contains more than one character candidate then,
in one embodiment of this invention, the possibility set is sent on to other modules, such as a 
subline checker and context module, for post processing, so that only one character candidate 
remains in the possibility set after this post processing is complete. The possibility set created 
contains, in addition to a list of character candidates, a corresponding list of confidences, which 
can be used to flag characters, which were not recognized with certainty so that they can be 
examined by a word processing operator (col. 23, l. 43-54).

8. The method of claim 1, wherein the triage model is a trained off-line triage model.

In order to classify unknown input characters, first, during a preprocessing phase, a large 
number of reference data is collected and analyzed in order to form "ringed clusters" for each 
class of input data. For example, if the input data are characters, a set of ringed clusters is 
associated with each character class, such as all "e". These ringed clusters are formed so as to 
be later used during the classification of an unknown input character (col. 2, l. 15-23).

9. The method of claim 1, wherein the OCR-output text passage threshold error rate is a 
predetermined value. 

In one embodiment of this invention, coarse ringed clusters are used such that all data of
a given class are associated with a single coarse ringed cluster. Later, during the classification of
unknown input data, these coarse ringed clusters are used to eliminate possible reference data
as being the unknown input data if the unknown input data does not fall within that coarse ringed
cluster. In another embodiment of this invention, a set of medium ringed clusters is used for each 
class of reference data such that all data of a given class fall within the union of the set of 
medium ringed clusters. By using a set of medium ringed clusters, fewer "aliens" (data other than 
the selected data associated with the medium ringed clusters) are contained in the ringed 
clusters, thereby enhancing the ability to accurately recognize unknown input characters. Another 
embodiment of this invention uses fine ringed clusters in which there are no known aliens 
contained within the union of fine ringed clusters of each given reference data class. This allows 
even greater accuracy in classifying unknown data input characters. As another feature of this
invention, ringed clusters include "certainty spheres" which are used to identify with certainty an
unknown input character if it lies within such a certainty sphere. As another feature of this
invention, ringed clusters also include "confidence spheres" which are used to identify, although
not with certainty, the unknown input character, and assign a confidence value indicating the
relative confidence associated with the possibility that this unknown character corresponds to the
reference data class of the ringed cluster (col. 2, l. 13-56).



10. The method of claim 7, wherein sending the OCR-output text passage through the 
post-OCR inspection and processing stage comprises: 

determining at least one text passage error probability value for each OCR-output text 
passage as a correction operator detects and corrects an error in the OCR-output text 
passage; 

A possibility set containing no character candidates, also referred to as a 'nonrecognition
possibility set', indicates that the input character simply has not been recognized. In one
embodiment of this invention, segments corresponding to nonrecognition possibility sets are, if 
desired, sent on for further processing. For example, further modules can be used to split 
segments corresponding to two or more characters (a 'join'), to filter out noise, or to glue together 
pieces of a character which are currently broken apart into more than one data segment (a 
'split'). If no further processing is done on a segment corresponding to a nonrecognition 
possibility set, then this possibility set can, if desired, be sent on for post processing to, for 
example, a spelling corrector module which would use contextual information to replace the 
nonrecognition possibility set with a possibility set containing a single character candidate. In one 
embodiment of this invention, the final text output corresponding to a nonrecognition possibility 
set is the '@' symbol so that a word processing operator can find all nonrecognized characters 
after the optical character recognition process is complete. If the possibility set created contains 
more than one character candidate then, in one embodiment of this invention, the possibility set is
sent on to other modules, such as a subline checker and context module, for post processing, so 
that only one character candidate remains in the possibility set after this post processing is 
complete. The possibility set created contains, in addition to a list of character candidates, a 
corresponding list of confidences, which can be used to flag characters, which were not 
recognized with certainty so that they can be examined by a word processing operator. The 
confidence values can also be used by the post processing modules described earlier to assist in 
choosing one of the character candidates (col. 23, l. 20-57).

and alerting the correction operator when the at least one text passage error probability 
value is improved so as to meet the OCR-output text passage threshold error value, 
wherein the text passage error probability value for each OCR-output text passage is 
based on a probability of the at least one OCR-output character attribute being 
erroneously interpreted by the system. 

The first step is to initialize the 'done' flag to 'FALSE'. If, at any point during the 
compute possibility set operation, the done flag is set to 'TRUE', then this indicates that the 
unknown character has been recognized with certainty and a possibility set has been created 
(col. 24, l. 66-68 and col. 25, l. 1-3).

12. A computer-implemented method for triage of a plurality of OCR-output text passages, 
each OCR-output text passage having at least one OCR-output character, the method 
comprising: 

selecting a set of OCR-output character attributes from a plurality of OCR-output character 
attributes for each OCR-output character;

See Claim 1, first limitation.

determining an OCR-output character error value for each OCR-output character based on
a probability of the set of OCR-output character attributes being erroneously interpreted 
by the OCR system; 

See Claim 1, second limitation. 

determining a text passage error value for each OCR-output text passage based on a 
probability of the text passage being erroneously interpreted by the OCR system as 
determined using at least the OCR-output character error values; 

The confidence value is based on the error value, as can be seen in Fig. 10B; thus high
confidence corresponds to low error, and vice versa.

and comparing the determined text passage error value with an OCR-output text passage 
threshold error value to perform an OCR-output text passage triage decision. 

See Claim 1 third limitation. 

13. The computer-implemented method of claim 12, wherein the probability of the set of 
OCR-output character attributes being erroneously interpreted by the OCR system is 
determined based on at least the selected set of OCR-output character attributes 
processed using the triage model. 



See Claim 2, second limitation.

14. The computer-implemented method of claim 12, wherein the plurality of OCR-output
character attributes includes at least one of a character class, a confidence descriptor 
class, a language of the text passage, a text passage publication date, a typeface in which 
the text passage is printed, an image-based feature of an individual character image and 
metadata attached to the text passage. 

See Claim 5. 

15. The computer-implemented method of claim 12, wherein the text passage to be triaged 
includes at least one of pages, characters, words, phrases, text-lines, sentences, 
paragraphs, columns of text, blocks of text, text articles, multi-page documents, 
collections of single-page documents and collections of multi-page documents. 

See Claim 6. 

16. The computer-implemented method of claim 12, wherein the OCR-output text passage 
triage decision includes at least one of sending the OCR-output text passage directly to an 
end user without post-OCR processing, sending the OCR-output text passage through a 
post-OCR inspection and processing stage, and sending the original text passage image 
to be keyed in manually. 

See Claim 7. 

17. The computer-implemented method of claim 16, wherein sending the OCR-output text 
passage through a post-OCR inspection and processing stage comprises:

determining at least one text passage error probability value for each OCR-output text
passage as a correction operator detects and corrects an error in the OCR-output text 
passage; 

See Claim 10 first limitation. 

and alerting the correction operator when the at least one text passage error probability 
value is improved so as to meet the OCR-output text passage threshold error value, 
wherein the text passage error probability value for each OCR-output text passage is 
based on a probability of the at least one OCR-output character attribute being 
erroneously interpreted by the system. 

See Claim 10 second limitation. 

19. An OCR-output text passage triage system that triages a text passage outputted by an 
optical character recognition system, the OCR-output text passage including at least one 
OCR-output character having at least one OCR-output character attribute, the system 
comprising: 

an OCR-output text passage character accuracy determination circuit or routine that 
determines a character interpretation error value using a triage model; 

See Claim 1, second limitation. 

an OCR-output text passage accuracy determination circuit or routine that determines at 
least one OCR-output text passage quality metric using the determined character
interpretation error value and at least one statistical algorithm or model included in the
triage model; 

A quality metric, as defined by the applicant, is a text passage error value represented as
a probability that the entire OCR-output text passage is erroneously interpreted by the OCR
system. This has been shown to be another representation of the "confidence" disclosed by
Bokser and is itself a statistical algorithm. See Claim 1, second limitation.

and an OCR-output text passage triage circuit or routine that performs one or more text 
passage triage decisions using the determined at least one OCR-output text passage 
quality metric and an OCR-output text passage threshold error rate value. 

See above and Claim 1, third limitation. 

20. The OCR-output text passage triage system of claim 19, wherein the triage model is a 
trained off-line triage model. 

See Claim 8. 

21. The OCR-output text passage triage system of claim 19, wherein the OCR-output text 
passage threshold error rate value is included in a text passage error threshold operating 
point model. 

Bokser's "confidence bound" model is synonymous with the applicant's "text passage 
error threshold operating point model", which is used to select a threshold operating point that 
will, with high confidence, satisfy customer-specified quality requirements while minimizing the 
labor needed to process document text passages that are not triaged. As can be seen in Bokser's
disclosure, the "confidence bounds" are defined (col. 2, l. 13-56) and implemented (col. 23, l. 11-57)
so that their threshold operating points satisfy customer-specified quality requirements
with high confidence, typically 97% (col. 23, l. 66).

22. The OCR-output text passage triage system of claim 19, wherein the at least one OCR- 
output character attribute includes at least one of a character class, a confidence 
descriptor class, a language of the text passage, a text passage publication date, a 
typeface in which the text passage is printed, an image-based feature of an individual 
character image and metadata attached to the text passage. 

See Claim 5. 

23. The OCR-output text passage triage system of claim 19, wherein the text passage to 
be triaged includes at least one of pages, characters, words, phrases, text-lines, 
sentences, paragraphs, columns of text, blocks of text, text articles, multi-page 
documents, collections of single-page documents and collections of multi-page 
documents. 

See Claim 6. 

24. The OCR-output text passage triage system of claim 19, wherein the OCR-output text 
passage triage decision includes at least one of sending the OCR-output text passage 
directly to an end user without post-OCR rekeying or correction, sending the OCR-output 
text passage through a post-OCR inspection and correction stage, and sending the 
original text passage image to be completely keyed in manually. 



See Claim 7.

Claim Rejections - 35 USC § 103

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness 
rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as 
set forth in section 102 of this title, if the differences between the subject matter sought to be 
patented and the prior art are such that the subject matter as a whole would have been obvious 
at the time the invention was made to a person having ordinary skill in the art to which said 
subject matter pertains. Patentability shall not be negatived by the manner in which the invention 
was made. 

5. Claims 3, 11, and 18 are rejected under 35 U.S.C. 103(a) as being unpatentable over Bokser (U.S.
Patent Number 4,773,099).

3. The method of claim 2, further comprising: 

determining a number representing a sum of OCR-output characters in the OCR-output 
text passage; 

Since Bokser's possibility set is provided for each unknown character (col. 23, l. 11-12)
and the applicant defines the text passage as being at least one character, the sum is one.

and dividing the text passage error value by the number representing the sum of OCR- 
output characters.

Dividing the confidences by the sum, which is one in this case, would leave the confidences
unchanged.
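The arithmetic behind this limitation can be shown with a short sketch. The function name `normalized_error` and the choice of a plain sum of character error values as the text passage error value are assumptions made for illustration, not language taken from the claims or from Bokser.

```python
def normalized_error(char_error_values):
    """Divide the passage error value by the number of OCR-output characters."""
    if not char_error_values:
        raise ValueError("a text passage has at least one OCR-output character")
    # Assumption: the text passage error value is the sum of character error values.
    passage_error = sum(char_error_values)
    char_count = len(char_error_values)
    return passage_error / char_count


# With a single-character passage the divisor is one, so the value is unchanged.
single = normalized_error([0.25])
# With several characters the division yields a per-character average.
several = normalized_error([0.1, 0.3])
```

Here `single` equals 0.25, the input value itself, which is the single-character case relied upon above; `several` is approximately 0.2.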

11. The method of claim 10, wherein determining the text passage error probability value 
for an OCR-output text passage comprises: 



determining OCR-output text passage error probability values for a plurality of selected 
portions of the OCR-output text passage; 

Bokser discloses a method that lists, for each of the plurality of characters classified, a
possibility set, which is a list of characters and associated confidences that the unknown
character might be (col. 23, l. 12-14).

and arranging the plurality of selected portions of the OCR-output text passage based on 
the determined OCR-output text passage error probability values such that the selected 
portions having the highest OCR-output text passage error probability values are 
displayed first to the correction operator. 

Though Bokser does not specifically disclose in what order the confidences are displayed,
it would be a design choice to display them in order from highest error (lowest confidence) to
lowest error (highest confidence).
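The ordering recited in this limitation amounts to a descending sort on the error probability. A minimal sketch follows; the function name `order_for_review` and the representation of each selected portion as a (text, error probability) pair are assumptions for illustration only.

```python
def order_for_review(portions):
    """Return portions sorted so the highest error probability comes first.

    portions: list of (text, error_probability) pairs (hypothetical layout).
    """
    # sorted() is stable, so portions with equal error keep their original order.
    return sorted(portions, key=lambda portion: portion[1], reverse=True)


queue = order_for_review([("line 1", 0.10), ("line 2", 0.90), ("line 3", 0.50)])
```

Under these assumptions `queue` begins with "line 2", the portion with the highest error probability, followed by "line 3" and then "line 1".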

18. The computer-implemented method of claim 12, wherein determining a text passage 
error probability value for an OCR-output text passage comprises: 

determining OCR-output text passage error probability values for a plurality of selected 
portions of the OCR-output text passage;

Bokser discloses a method that lists, for each of the plurality of characters classified, a
possibility set, which is a list of characters and associated confidences that the unknown
character might be (col. 23, l. 12-14).

and arranging the plurality of selected portions of the OCR-output text passage based on 
the determined OCR-output text passage error probability values such that the selected 
portions having the highest OCR-output text passage error probability values are 
displayed first to the correction operator. 

Though Bokser does not specifically disclose in what order the confidences are displayed,
it would be a design choice to display them in order from highest error (lowest confidence) to
lowest error (highest confidence).

Conclusion 

The prior art made of record and not relied upon is considered pertinent to the applicant's 
disclosure. Hull et al. (U.S. Patent Number 5,465,653) discloses an image matching and retrieval system, 
which uses calculated errors in order to match images. Bares (U.S. Patent Number 5,057,936) discloses 
a copy quality monitoring system that uses background, foreground error calculations to determine the 
quality of the copies as compared to the originals. 

Any inquiry concerning this communication or earlier communications from the examiner should 
be directed to Jonathan C. Schaffer whose telephone number is (571) 272-0603. The examiner can 
normally be reached on 7:30am - 4:00pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor,
Joseph Mancuso, can be reached at (571) 272-7695. The fax phone number for the organization where
this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application
Information Retrieval (PAIR) system. Status information for published applications may be obtained from 
either Private PAIR or Public PAIR. Status information for unpublished applications is available through 
Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) 
at 866-217-9197 (toll-free).